On the Optimality of Myopic Sensing 
in Multi-channel Opportunistic Access: 
the Case of Sensing Multiple Channels 
Abstract 



Recent works ([1], [2]) have developed a simple and robust myopic sensing policy for multi-channel
opportunistic communication systems where a secondary user (SU) can access one of N i.i.d. Markovian channels.
The optimality of the myopic sensing policy in maximizing the SU's cumulated reward is established under certain
conditions on channel parameters. This paper studies the generic case where the SU can sense more than one channel
each time. By characterizing the myopic sensing policy in this context, we establish analytically its optimality for
certain system settings when the SU is allowed to sense two channels. In the more generic case, we construct
counterexamples to show that the myopic sensing policy, despite its simple structure, is non-optimal. 

Index Terms 

Opportunistic spectrum access (OSA), myopic sensing policy, partially observed Markov decision process 
(POMDP), restless multi-armed bandit problem (RMAB) 

I. Introduction 

The concept of opportunistic spectrum access (OSA), first envisioned by J. Mitola in the seminal paper 
[3] on software-defined radio systems, has emerged in recent years as a promising paradigm to enable 
more efficient spectrum utilization. The basic idea of OSA is to exploit instantaneous spectrum availability 
by allowing the unlicensed secondary users (SU) to access the temporarily unused channels of the licensed 
primary users (PU) in an opportunistic fashion. In this context, a well-designed channel access policy is 
crucial to achieve efficient spectrum usage. 

The authors are with the Laboratoire de Recherche en Informatique (LRI), Department of Computer Science, the University of Paris-Sud 
XI, 91405 Orsay, France (e-mail: {Kehao.Wang, Lin.Chen}@lri.fr). 
Fig. 1. The Markov channel model 



In this paper, we consider a generic OSA scenario where there are $N$ slotted spectrum channels partially 
occupied by the PUs. Each channel evolves as an independent and identically distributed (i.i.d.), two-state 
discrete-time Markov chain. As illustrated in Fig. 1, the two states for each channel, busy (state 0) and 
idle (state 1), indicate whether the channel is free for an SU to transmit its packet on that channel at a 
given slot. The state transition probabilities are given by $\{p_{ij}\}$, $i, j = 0, 1$. An SU seeks a sensing policy 
that opportunistically exploits the temporarily unused channels to transmit its packets. To this end, in each 
slot, the SU selects a subset of channels to sense based on its prior observations and obtains one unit of 
reward if at least one of the sensed channels is in the idle state, indicating that the SU can effectively 
send one packet using the idle channel (or one of the idle channels) unused by PUs in the current slot. The 
objective of the SU is to find the optimal sensing policy maximizing the reward that it can obtain over a 
finite or infinite time horizon. 

As stated in [1], the design of the optimal sensing policy can be formulated as a partially observable 
Markov decision process (POMDP) [4], or a restless multi-armed bandit problem (RMAB) [5], whose 
application extends far beyond the domain of cognitive radio systems¹. Unfortunately, obtaining the optimal 
policy for a general POMDP or RMAB is often intractable due to the exponential computation complexity. 
Hence, a natural alternative is to seek simple myopic policies for the SU. In this line of research, a myopic 
sensing strategy is developed in [1] for the case where the SU is limited to sensing only one channel at 
each slot. The myopic sensing policy in this case is proven to be optimal when the state transitions of the 
Markov channels are positively correlated, i.e., $p_{11} > p_{01}$. 

In this paper, we naturally extend the proposed myopic policy to the generic case where the SU can 
sense more than one channel each time and gets one unit of reward if at least one of the sensed channels 
is in the idle state. Through mathematical analysis, we show that the generalized myopic sensing policy is 
optimal only for a small subset of cases where the SU is allowed to sense two channels each slot. In the 
general case, we give counterexamples to show that the myopic sensing policy, despite its simple structure, 
is not optimal. We believe that the results presented in this paper, together with [1], [2], [6], lead to a more 
in-depth understanding of the intrinsic structure and the resulting optimality of the myopic sensing policy 
and will stimulate more profound research on this topic. 

¹ Please refer to [1], [2] for more examples where this formulation is applicable. A summary of the related works on the analysis of this 
problem using the POMDP and RMAB approaches is presented in [2]. We thus do not provide a literature survey in this paper. 

Before concluding the Introduction, it is insightful to compare the results obtained in this paper 
with those presented in [6] on a similar problem. In [6], the authors show that when $p_{11} > p_{01}$ holds, the 
myopic sensing policy is optimal even for the case where the SU senses more than one channel each slot. 
These two results seem contradictory. The difference arises because in [6] the objective of the 
SU is to find as many idle channels as possible, thus maximizing the throughput under the condition that 
the SU can transmit on all the idle channels. In contrast, our results focus on the scenario where 
the SU can only transmit on one channel. As a result, the SU aims at maximizing the probability of finding 
at least one idle channel. This nuance in the model (more specifically, in the 
utility function) leads to opposite results, indicating that more research efforts are needed 
to understand the intrinsic characteristics of the myopic policy. In fact, we are currently investigating the 
forms of utility function under which the myopic policy can be optimal. 

The rest of this paper is structured as follows. Section II formulates the channel sensing optimization 
problem for the SU and presents the myopic sensing policy in the generic case. Section III studies the 
optimality of the myopic sensing policy, with Subsection III-A establishing the optimality of the myopic 
sensing policy for a subset of scenarios when the SU is allowed to sense two channels each time, and 
Subsection III-B illustrating the non-optimality of the myopic sensing policy for the general case through 
two representative counterexamples. Finally, the paper is concluded in Section IV. 

II. Problem Formulation 

As explained in the Introduction, we are interested in a synchronously slotted cognitive radio network 
where an SU can opportunistically access a set $\mathcal{N}$ of $N$ i.i.d. channels partially occupied by PUs. The 
state of each channel $i$ in time slot $t$, denoted by $S_i(t)$, is modeled by a discrete-time two-state Markov 
chain shown in Fig. 1. At the beginning of each slot $t$, the SU selects a subset $A(t)$ of channels to sense. 
If at least one of the sensed channels is in the idle state (i.e., unoccupied by any PU), the SU transmits 
its packet and collects one unit of reward. Otherwise, the SU cannot transmit, thus obtaining no reward. 



This decision procedure is repeated for each slot. The focus of our work is to study the optimal sensing 
policy of the SU in order to maximize the average reward over $T$ slots. Let $A = \{A(t), 1 \le t \le T\}$. The 
optimization problem of the SU, $P_{SU}$, when the SU is allowed to sense $k$ channels, is formally defined as 
follows²: 

$$P_{SU}: \max_{A(t) \subseteq \mathcal{N},\ |A(t)| = k,\ 1 \le t \le T} \ \frac{1}{T} \sum_{t=1}^{T} \Big[ 1 - \prod_{i \in A(t)} \big(1 - \omega_i(t)\big) \Big] \tag{1}$$



where $\omega_i(t)$ is the conditional probability that $S_i(t) = 1$ given the past actions and observations³. Based 
on the sensing policy $A(t)$ in slot $t$ and the sensing result, $\{\omega_i(t), i \in \mathcal{N}\}$ can be updated using Bayes' 
rule as shown in (2): 

$$\omega_i(t+1) = \begin{cases} p_{11}, & i \in A(t),\ S_i(t) = 1 \\ p_{01}, & i \in A(t),\ S_i(t) = 0 \\ \tau(\omega_i(t)), & i \notin A(t) \end{cases} \tag{2}$$

where $\tau(\omega_i(t)) = \omega_i(t)\, p_{11} + [1 - \omega_i(t)]\, p_{01}$ characterizes the evolution of the belief value of the non-sensed 
channels. 
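As an illustration, the update rule (2) can be sketched in a few lines of Python. The function names `tau` and `update_beliefs`, the 0-based channel indices and the dictionary used to carry the sensing results are assumptions made for this sketch, not notation from the formulation above.

```python
from typing import Dict, List


def tau(omega: float, p11: float, p01: float) -> float:
    """Belief evolution of a channel that is not sensed in the current slot."""
    return omega * p11 + (1.0 - omega) * p01


def update_beliefs(omega: List[float], observed: Dict[int, int],
                   p11: float, p01: float) -> List[float]:
    """Apply rule (2). `observed` maps each sensed channel to its state (1 idle, 0 busy)."""
    new_omega = []
    for i, w in enumerate(omega):
        if i in observed:
            # Sensed channel: the next belief depends only on the observed state.
            new_omega.append(p11 if observed[i] == 1 else p01)
        else:
            # Non-sensed channel: propagate the belief through the Markov chain.
            new_omega.append(tau(w, p11, p01))
    return new_omega


# Channels 0 and 1 are sensed (idle and busy, respectively); channel 2 is not sensed.
beliefs = update_beliefs([0.9, 0.6, 0.2], {0: 1, 1: 0}, p11=0.8, p01=0.2)
```

In this example the sensed channels take the values $p_{11}$ and $p_{01}$, while the non-sensed channel moves from 0.2 to $\tau(0.2) = 0.32$.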

As argued in the Introduction, the optimization problem $P_{SU}$ is by nature a POMDP, or a restless multi-armed 
bandit problem, for which the optimal sensing policy is in general intractable. Hence, a natural 
alternative is to seek a simple myopic sensing policy, i.e., the sensing policy maximizing the immediate 
reward based on the current belief. In this line of research, a myopic sensing policy is developed in [1] for 
the case where the SU is limited to sensing only one channel at each slot. The goal of our work presented 
in this paper is to study the optimality of the myopic sensing policy in the generic case where the SU can 
sense multiple channels each time. To this end, we first derive the structure of the myopic sensing policy 
for the general case and then provide an in-depth analysis of its optimality in Section III. 

Definition 1 (Structure of Myopic Sensing in Generic Case). Sort the elements of the belief vector in 
descending order such that $\omega_1(t) \ge \omega_2(t) \ge \cdots \ge \omega_N(t)$. The myopic sensing policy in the generic case, 
where the SU is allowed to sense $k$ channels, consists of sensing channel 1 to channel $k$. 
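Definition 1 amounts to picking the $k$ largest entries of the belief vector. A minimal Python sketch follows; the helper names `myopic_set` and `immediate_reward`, the 0-based indices and the tie-breaking rule (lower index first) are assumptions of this sketch.

```python
from math import prod
from typing import List, Tuple


def myopic_set(omega: List[float], k: int) -> Tuple[int, ...]:
    """Indices of the k largest beliefs (ties broken in favor of the lower index)."""
    order = sorted(range(len(omega)), key=lambda i: (-omega[i], i))
    return tuple(sorted(order[:k]))


def immediate_reward(omega: List[float], sensed: Tuple[int, ...]) -> float:
    """Probability that at least one sensed channel is idle."""
    return 1.0 - prod(1.0 - omega[i] for i in sensed)


omega = [0.3, 0.9, 0.5, 0.7]
chosen = myopic_set(omega, k=2)            # the two channels with the largest beliefs
reward = immediate_reward(omega, chosen)   # 1 - (1 - 0.9) * (1 - 0.7)
```

Because the immediate reward is increasing in every sensed belief, no other set of $k$ channels yields a larger one-slot reward than the set returned by `myopic_set`.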

The myopic sensing policy is easy to implement and maximizes the immediate payoff. In the next section, 
we show that the myopic sensing policy is optimal for the case $k = 2$ and $T = 2$ when $p_{11} > p_{01}$, and when 
$p_{11} < p_{01}$ and $N \le 4$. Beyond this small subset of parameter settings, we show that the myopic sensing 
policy, despite its simple structure, is not optimal by constructing two representative counterexamples. 

² A more generic utility function can be formed by integrating a discount factor and allowing $T = +\infty$. With slight adaptation, the 
analysis in this paper can be extended to that setting. 

³ $\omega_i(0)$ can be set to $\frac{p_{01}}{1 + p_{01} - p_{11}}$ if no information about the initial system state is available. 

III. Optimality of Myopic Sensing Policy 

In this section, we study the optimality of the myopic sensing policy for the generic case ($k \ge 2$). More 
specifically, we structure our analysis into two cases: (1) $T = 2$, $k = 2$; (2) the general case. 

A. Optimality of myopic sensing policy when T = 2 and k = 2 

This subsection focuses on the case where the SU is allowed to sense two channels each slot and 
aims at maximizing the reward of the upcoming two slots. In terms of user behavior, this case models a 
short-sighted SU. The following two theorems study the optimality of the myopic sensing policy in this 
case for $p_{11} > p_{01}$ and $p_{11} < p_{01}$, respectively. 

Theorem 1 (Optimality of Myopic Sensing Policy for $T = 2$ and $k = 2$: $p_{11} > p_{01}$). In the case where 
$T = 2$ and $k = 2$, the myopic sensing policy is optimal when $p_{11} > p_{01}$. 

Proof: We sort the elements of the belief vector at the beginning of slot $t$, $[\omega_1(t), \omega_2(t), \cdots, \omega_N(t)]$, 
in descending order such that $\omega_1 \ge \omega_2 \ge \cdots \ge \omega_N$⁴. Under this notation, we can write the reward of the 
myopic sensing policy (i.e., sensing channels 1 and 2), denoted as $R^*$, as 

$$\begin{aligned} R^* ={}& \underbrace{1 - (1-\omega_1)(1-\omega_2)}_{A} + \underbrace{\omega_1\omega_2\,[1 - (1-p_{11})(1-p_{11})]}_{B} + \underbrace{\omega_1(1-\omega_2)\,[1 - (1-p_{11})(1-\tau(\omega_3))]}_{C} \\ &+ \underbrace{(1-\omega_1)\omega_2\,[1 - (1-p_{11})(1-\tau(\omega_3))]}_{D} + \underbrace{(1-\omega_1)(1-\omega_2)\,[1 - (1-\tau(\omega_3))(1-F)]}_{E}, \end{aligned} \tag{3}$$

where $\tau(\omega) = \omega p_{11} + (1-\omega) p_{01}$ is defined in (2), and $F = p_{01}$ when $N = 3$ and $F = \tau(\omega_4)$ when $N \ge 4$. More 
specifically, term A denotes the immediate reward in the upcoming slot $t$; term B denotes the expected 
reward of slot $t+1$ when both channels are sensed to be free; term C (term D, respectively) denotes the 
expected reward of slot $t+1$ when only channel 1 (channel 2) is sensed to be free; term E denotes the 
expected reward of slot $t+1$ when both channels are sensed to be busy. 
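The decomposition into terms A to E can be cross-checked numerically by enumerating the sensing outcomes of slot $t$. The sketch below is only a sanity check under stated assumptions: the function names, the 0-based indexing and the sample parameters ($p_{11} = 0.8$, $p_{01} = 0.2$, four channels) are ours, and the belief vector is assumed to be sorted in descending order with $p_{11} > p_{01}$.

```python
from itertools import product
from math import prod


def tau(w, p11, p01):
    """Belief evolution of a non-sensed channel, as in (2)."""
    return w * p11 + (1.0 - w) * p01


def two_slot_reward(omega, sensed, p11, p01):
    """Expected reward over slots t and t+1: sense `sensed` now, the k best beliefs next."""
    k = len(sensed)
    total = 1.0 - prod(1.0 - omega[i] for i in sensed)  # immediate reward (term A)
    for states in product((0, 1), repeat=k):            # every sensing outcome in slot t
        outcome = dict(zip(sensed, states))
        prob = prod(omega[i] if s else 1.0 - omega[i] for i, s in outcome.items())
        nxt = [(p11 if outcome[i] else p01) if i in outcome else tau(w, p11, p01)
               for i, w in enumerate(omega)]
        best = sorted(nxt, reverse=True)[:k]             # greedy choice in the last slot
        total += prob * (1.0 - prod(1.0 - w for w in best))
    return total


def closed_form_myopic(omega, p11, p01):
    """Right-hand side of (3) for k = 2, with F chosen according to N."""
    w1, w2, w3 = omega[:3]
    t3 = tau(w3, p11, p01)
    F = p01 if len(omega) == 3 else tau(omega[3], p11, p01)
    a = 1.0 - p11
    return (1 - (1 - w1) * (1 - w2)                           # term A
            + w1 * w2 * (1 - a * a)                           # term B
            + w1 * (1 - w2) * (1 - a * (1 - t3))              # term C
            + (1 - w1) * w2 * (1 - a * (1 - t3))              # term D
            + (1 - w1) * (1 - w2) * (1 - (1 - t3) * (1 - F)))  # term E


omega = [0.9, 0.7, 0.5, 0.3]
enumerated = two_slot_reward(omega, (0, 1), p11=0.8, p01=0.2)
formula = closed_form_myopic(omega, p11=0.8, p01=0.2)
```

For these sample values both computations give 1.9015, with term A contributing 0.97 and terms B to E contributing the remaining 0.9315.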

The proof of Theorem 1 consists of showing that sensing any two channels $\{i,j\} \ne \{1,2\}$ cannot bring 
the SU more reward. We proceed with the proof for the following two cases: 

• $\{i,j\}$ is partially overlapped with $\{1,2\}$, i.e., $\{i,j\} \cap \{1,2\} \ne \emptyset$; 
• $\{i,j\}$ is totally distinct from $\{1,2\}$, i.e., $\{i,j\} \cap \{1,2\} = \emptyset$. 

⁴ For simplicity of presentation, and with a slight abuse of notation that introduces no ambiguity, we drop the time slot index of $\omega_i(t)$. 

Case 1. When $\{i,j\}$ is partially overlapped with $\{1,2\}$, without loss of generality, assume that $i = 1$ 
and $j \ge 3$. We can derive an upper bound on the expected reward of sensing the channels $\{i,j\} = \{1, j\}$, 
as shown in equation (4) (when $j = 3$) and equation (6) (when $j \ge 4$). Here, by upper bound we mean 
that the SU senses channels $i$ and $j$ in slot $t$ and the two channels with the largest idle probabilities in 
slot $t+1$, leading to the maximal reward that the SU can achieve. 

When $j = 3$, following an analysis similar to that in (3), the utility upper bound when sensing the 
channels $\{i,j\} = \{1,3\}$, denoted by $R_1$, can be derived as follows: 

$$\begin{aligned} R_1 ={}& 1 - (1-\omega_1)(1-\omega_j) + \omega_1\omega_j\,[1 - (1-p_{11})(1-p_{11})] + \omega_1(1-\omega_j)\,[1 - (1-p_{11})(1-\tau(\omega_2))] \\ &+ (1-\omega_1)\omega_j\,[1 - (1-p_{11})(1-\tau(\omega_2))] + (1-\omega_1)(1-\omega_j)\,[1 - (1-\tau(\omega_2))(1-p_{01})]. \end{aligned} \tag{4}$$

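After some algebraic operations, we obtain 

$$R^* - R_1 = (1-\omega_1)(\omega_2-\omega_3)\big(1 - (1-p_{11})(F - p_{01})\big), \tag{5}$$

where $F$ is defined in (3). Noticing that $p_{01} \le \tau(\omega_i) \le p_{11}$, $\forall i \in \mathcal{N}$, following (2), it holds that $F \ge p_{01}$. 
Since both $1 - p_{11}$ and $F - p_{01}$ lie in $[0, 1]$, every factor in (5) is nonnegative. Hence $R^* - R_1 \ge 0$ holds for $j = 3$. 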

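When $j \ge 4$, the utility upper bound can be derived as: 

$$\begin{aligned} R_1 ={}& 1 - (1-\omega_1)(1-\omega_j) + \omega_1\omega_j\,[1 - (1-p_{11})(1-p_{11})] + \omega_1(1-\omega_j)\,[1 - (1-p_{11})(1-\tau(\omega_2))] \\ &+ (1-\omega_1)\omega_j\,[1 - (1-p_{11})(1-\tau(\omega_2))] + (1-\omega_1)(1-\omega_j)\,[1 - (1-\tau(\omega_2))(1-\tau(\omega_3))]. \end{aligned} \tag{6}$$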
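It follows that 

$$\begin{aligned} R^* - R_1 ={}& \omega_1(1-\omega_2)(\omega_3-\omega_j)(p_{11}-p_{01}) + (1-\omega_1)\big(\tau(\omega_2)-\tau(\omega_j)\big)\big(\omega_2(1-p_{11}) + (1-\tau(\omega_3))(1-\omega_2)\big) \\ &+ (1-\omega_1)\big(\tau(\omega_2)-\tau(\omega_j)\big)\big(1 - (1-p_{11})(\tau(\omega_3)-p_{01})\big). \end{aligned} \tag{7}$$

It follows from the definition of $\tau(\omega)$ after (2) that when $p_{01} < p_{11}$, $\tau(\omega)$ is increasing in $\omega$ and that 
$p_{01} \le \tau(\omega_3) \le p_{11}$. Hence $R^* - R_1 \ge 0$ holds for $j \ge 4$, too. 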

The above results show that any other sensing policy cannot outperform the myopic sensing policy in 
this case. 

Case 2. When $\{i,j\}$ is totally distinct from $\{1, 2\}$, implying $N \ge 4$, we can write the reward upper bound 
of the sensing policy, denoted as $R_2$, in (8): 

$$\begin{aligned} R_2 ={}& 1 - (1-\omega_i)(1-\omega_j) + \omega_i\omega_j\,[1 - (1-p_{11})(1-p_{11})] + \omega_i(1-\omega_j)\,[1 - (1-p_{11})(1-\tau(\omega_1))] \\ &+ (1-\omega_i)\omega_j\,[1 - (1-p_{11})(1-\tau(\omega_1))] + (1-\omega_i)(1-\omega_j)\,[1 - (1-\tau(\omega_1))(1-\tau(\omega_2))]. \end{aligned} \tag{8}$$

We take $R_1$ in (6) as an auxiliary quantity to proceed with our mathematical analysis. More specifically, comparing $R_2$ 
to $R_1$ in (6), after some algebraic operations, we have 

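$$\begin{aligned} R_1 - R_2 ={}& (1-\omega_j)(\omega_1-\omega_i) + \omega_j(\omega_1-\omega_i)\big(p_{11} + p_{11} - (p_{11})^2\big) \\ &+ (1-\omega_j)\big[p_{11}(\omega_1-\omega_i) + (1-p_{11})\big(\omega_1\tau(\omega_2) - \tau(\omega_1)\omega_i\big)\big] \\ &+ \omega_j\big[(1-p_{11})\big((1-\omega_1)\tau(\omega_2) - (1-\omega_i)\tau(\omega_1)\big) - p_{11}(\omega_1-\omega_i)\big] \\ &+ (1-\omega_j)\big[-(\omega_1-\omega_i) + (1-\tau(\omega_2))\big[(1-\tau(\omega_1))(1-\omega_i) - (1-\omega_1)(1-\tau(\omega_3))\big]\big] \\ \ge{}& (1-\omega_j)(\omega_1-\omega_i) + \omega_j(\omega_1-\omega_i)\big(p_{11} + p_{11} - (p_{11})^2\big) \\ &+ (1-\omega_j)\big[p_{11}(\omega_1-\omega_i) + (1-p_{11})\big(\omega_1\tau(\omega_i) - \tau(\omega_1)\omega_i\big)\big] \\ &+ \omega_j\big[(1-p_{11})\big((1-\omega_1)\tau(\omega_i) - (1-\omega_i)\tau(\omega_1)\big) - p_{11}(\omega_1-\omega_i)\big] \\ &+ (1-\omega_j)\big[-(\omega_1-\omega_i) + (1-\tau(\omega_2))\big[(1-\tau(\omega_1))(1-\omega_i) - (1-\omega_1)(1-\tau(\omega_i))\big]\big] \\ ={}& (1-\omega_j)(\omega_1-\omega_i)\big[p_{11} + (1-p_{11})p_{01} + (1-p_{11})(1-\tau(\omega_2))\big] \ge 0. \end{aligned} \tag{9}$$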
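The last equality in (9) can be checked with three elementary identities that follow directly from $\tau(\omega) = \omega p_{11} + (1-\omega)p_{01}$. The short derivation below is a verification sketch added for the reader's convenience and is not part of the original proof.

```latex
\begin{align*}
\omega_1\tau(\omega_i) - \omega_i\tau(\omega_1) &= (\omega_1-\omega_i)\,p_{01},\\
(1-\omega_1)\tau(\omega_i) - (1-\omega_i)\tau(\omega_1) &= -(\omega_1-\omega_i)\,p_{11},\\
(1-\tau(\omega_1))(1-\omega_i) - (1-\omega_1)(1-\tau(\omega_i)) &= (\omega_1-\omega_i)(1-p_{11}).
\end{align*}
% Coefficient of \omega_j(\omega_1-\omega_i):
%   2p_{11} - p_{11}^2 - (1-p_{11})p_{11} - p_{11} = 0.
% Coefficient of (1-\omega_j)(\omega_1-\omega_i):
%   1 + p_{11} + (1-p_{11})p_{01} - 1 + (1-p_{11})(1-\tau(\omega_2))
%   = p_{11} + (1-p_{11})p_{01} + (1-p_{11})(1-\tau(\omega_2)).
```

The terms multiplied by $\omega_j$ therefore cancel, and the terms multiplied by $1-\omega_j$ add up to the bracket in the last line of (9), which is nonnegative because $\omega_1 \ge \omega_i$.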

It then follows that 

$$R^* - R_2 = (R^* - R_1) + (R_1 - R_2) \ge 0, \tag{10}$$

meaning that sensing $\{i,j\}$ cannot outperform the myopic sensing policy in this case, either. Combining 
the results of both cases completes the proof of Theorem 1. ■ 
The following theorem studies the optimality of the myopic sensing policy when $p_{11} < p_{01}$. The proof 
follows the same lines as that of Theorem 1 and is thus omitted. 

Theorem 2 (Optimality of Myopic Sensing Policy for $T = 2$ and $k = 2$: $p_{11} < p_{01}$). In the case where 
$T = 2$ and $k = 2$, the myopic sensing policy is optimal when $p_{11} < p_{01}$ for a system consisting of at 
most 4 channels (i.e., $N \le 4$). 

The optimality of the myopic sensing policy derived in this subsection, especially when $p_{11} > p_{01}$, 
hinges on the fact that the eventual loss of reward in slot $t+1$, if any, is overcompensated by 
the reward gain in the current slot $t$. However, this result cannot be extended to the general case. On the 
contrary, in the next subsection, we show that the myopic sensing policy may not be optimal by providing 
two representative counterexamples. 

B. Non-optimality of myopic sensing policy in general cases 

In this subsection, we show that the myopic sensing policy is not optimal for the general cases beyond 
those studied in Section III-A by constructing two representative counterexamples. 

Counterexample 1 ($k = 3$, $T = 2$, $N = 6$, $p_{11} > p_{01}$). Consider a system with $k = 3$, $T = 2$, $N = 6$ 
and $p_{11} > p_{01}$. The reward generated by the myopic sensing policy (sensing the 3 channels with the highest 
elements in the belief vector at each slot, i.e., $\omega_1$, $\omega_2$, $\omega_3$ in slot $t$) is given by 

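$$\begin{aligned} R^*_{c1} ={}& 1 - (1-\omega_1)(1-\omega_2)(1-\omega_3) + \omega_1\omega_2\omega_3\,[1 - (1-p_{11})^3] \\ &+ [\omega_1\omega_2(1-\omega_3) + \omega_1(1-\omega_2)\omega_3 + (1-\omega_1)\omega_2\omega_3]\,[1 - (1-p_{11})^2(1-\tau(\omega_4))] \\ &+ [\omega_1(1-\omega_2)(1-\omega_3) + (1-\omega_1)\omega_2(1-\omega_3) + (1-\omega_1)(1-\omega_2)\omega_3]\,[1 - (1-p_{11})(1-\tau(\omega_4))(1-\tau(\omega_5))] \\ &+ (1-\omega_1)(1-\omega_2)(1-\omega_3)\,[1 - (1-\tau(\omega_4))(1-\tau(\omega_5))(1-\tau(\omega_6))]. \end{aligned} \tag{11}$$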

On the other hand, consider the sensing policy that senses the 2 highest elements and the fourth highest 
element in the belief vector (i.e., $\omega_1$, $\omega_2$ and $\omega_4$ according to our notation) in the current slot $t$ and 
senses the highest 3 elements in the belief vector in slot $t+1$. The reward generated by this policy is 

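$$\begin{aligned} R_{c1} ={}& 1 - (1-\omega_1)(1-\omega_2)(1-\omega_4) + \omega_1\omega_2\omega_4\,[1 - (1-p_{11})^3] \\ &+ [\omega_1\omega_2(1-\omega_4) + \omega_1(1-\omega_2)\omega_4 + (1-\omega_1)\omega_2\omega_4]\,[1 - (1-p_{11})^2(1-\tau(\omega_3))] \\ &+ [\omega_1(1-\omega_2)(1-\omega_4) + (1-\omega_1)\omega_2(1-\omega_4) + (1-\omega_1)(1-\omega_2)\omega_4]\,[1 - (1-p_{11})(1-\tau(\omega_3))(1-\tau(\omega_5))] \\ &+ (1-\omega_1)(1-\omega_2)(1-\omega_4)\,[1 - (1-\tau(\omega_3))(1-\tau(\omega_5))(1-\tau(\omega_6))]. \end{aligned} \tag{12}$$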

It can be calculated that under the setting $[\omega_1, \omega_2, \omega_3, \omega_4, \omega_5, \omega_6] = [0.99, 0.5, 0.4, 0.39, 0.25, 0.25]$, $p_{11} = 
0.5$, $p_{01} = 0.3$, it holds that $R_{c1} - R^*_{c1} = 0.00005625 > 0$. The myopic sensing policy is not optimal for 
this counterexample. 

Counterexample 2 ($k = 3$, $T = 2$, $N = 6$, $p_{11} < p_{01}$). In the case of $p_{11} < p_{01}$, $k = 3$, $T = 2$ and $N = 6$, 
the reward generated by the myopic sensing policy is: 

$$\begin{aligned} R^*_{c2} ={}& 1 - (1-\omega_1)(1-\omega_2)(1-\omega_3) + \omega_1\omega_2\omega_3\,[1 - (1-\tau(\omega_6))(1-\tau(\omega_5))(1-\tau(\omega_4))] \\ &+ [\omega_1\omega_2(1-\omega_3) + \omega_1(1-\omega_2)\omega_3 + (1-\omega_1)\omega_2\omega_3]\,[1 - (1-p_{01})(1-\tau(\omega_6))(1-\tau(\omega_5))] \\ &+ [\omega_1(1-\omega_2)(1-\omega_3) + (1-\omega_1)\omega_2(1-\omega_3) + (1-\omega_1)(1-\omega_2)\omega_3]\,[1 - (1-p_{01})^2(1-\tau(\omega_6))] \\ &+ (1-\omega_1)(1-\omega_2)(1-\omega_3)\,[1 - (1-p_{01})^3]. \end{aligned} \tag{13}$$
The reward generated by the strategy of sensing channels 1,2,4 is: 

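$$\begin{aligned} R_{c2} ={}& 1 - (1-\omega_1)(1-\omega_2)(1-\omega_4) + \omega_1\omega_2\omega_4\,[1 - (1-\tau(\omega_6))(1-\tau(\omega_5))(1-\tau(\omega_3))] \\ &+ [\omega_1\omega_2(1-\omega_4) + \omega_1(1-\omega_2)\omega_4 + (1-\omega_1)\omega_2\omega_4]\,[1 - (1-p_{01})(1-\tau(\omega_6))(1-\tau(\omega_5))] \\ &+ [\omega_1(1-\omega_2)(1-\omega_4) + (1-\omega_1)\omega_2(1-\omega_4) + (1-\omega_1)(1-\omega_2)\omega_4]\,[1 - (1-p_{01})^2(1-\tau(\omega_6))] \\ &+ (1-\omega_1)(1-\omega_2)(1-\omega_4)\,[1 - (1-p_{01})^3]. \end{aligned} \tag{14}$$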

We have $R_{c2} - R^*_{c2} = 0.00002 > 0$ with the parameters $[\omega_1, \omega_2, \omega_3, \omega_4, \omega_5, \omega_6] = [0.99, 0.5, 0.4, 0.39, 0.25, 0.25]$, 
$p_{11} = 0.3$, $p_{01} = 0.5$. The myopic sensing policy is not optimal for this counterexample, either. 
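The two counterexamples can be checked numerically by enumerating the sensing outcomes of slot $t$ and sensing the three channels with the largest updated beliefs in slot $t+1$. The following Python sketch does so under our own assumptions: the function name `two_slot_reward` and the 0-based channel indices are illustrative, and the check only concerns the sign of the reward gap between sensing channels 1, 2, 4 and the myopic choice 1, 2, 3.

```python
from itertools import product
from math import prod


def two_slot_reward(omega, sensed, p11, p01):
    """Expected reward over slots t and t+1: sense `sensed` now, the k best beliefs next."""
    k = len(sensed)
    total = 1.0 - prod(1.0 - omega[i] for i in sensed)  # immediate reward in slot t
    for states in product((0, 1), repeat=k):            # every sensing outcome in slot t
        outcome = dict(zip(sensed, states))
        prob = prod(omega[i] if s else 1.0 - omega[i] for i, s in outcome.items())
        # Belief update (2): observed channels jump to p11 or p01, the others follow tau.
        nxt = [(p11 if outcome[i] else p01) if i in outcome
               else w * p11 + (1.0 - w) * p01
               for i, w in enumerate(omega)]
        best = sorted(nxt, reverse=True)[:k]             # greedy choice in the last slot
        total += prob * (1.0 - prod(1.0 - w for w in best))
    return total


omega = [0.99, 0.5, 0.4, 0.39, 0.25, 0.25]
myopic, alternative = (0, 1, 2), (0, 1, 3)  # channels {1, 2, 3} and {1, 2, 4}

# Counterexample 1: p11 = 0.5 > p01 = 0.3
gap_1 = (two_slot_reward(omega, alternative, 0.5, 0.3)
         - two_slot_reward(omega, myopic, 0.5, 0.3))
# Counterexample 2: p11 = 0.3 < p01 = 0.5
gap_2 = (two_slot_reward(omega, alternative, 0.3, 0.5)
         - two_slot_reward(omega, myopic, 0.3, 0.5))
```

Both gaps come out positive, so the non-myopic choice in slot $t$ yields the larger two-slot reward in each setting. For Counterexample 2 the gap is about $2 \times 10^{-5}$, in line with the value stated above.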

Remark. The above counterexamples can serve as a basis to construct more general counterexamples for 
the case where $T \ge 3$. One such counterexample for $T \ge 3$ is a policy that follows the sensing 
policy given in the counterexample, which yields the better two-slot reward, and then follows the optimal sensing 
policy. As a result, the global reward of the constructed sensing policy outweighs that of the myopic 
sensing policy. 

We are now ready to state the major result in this paper. 

Theorem 3 (Non-optimality of Myopic Sensing Policy in General Case). The myopic sensing policy is 
not guaranteed to be optimal in the general case. 

Remark. To conclude this section, it is insightful to note that the major results of this paper on the 
optimality of the myopic sensing policy, stated in Theorem 1, Theorem 2 and Theorem 3, hinge on 
the fundamental trade-off between exploration, by sensing unexplored channels in order to learn and 
predict the future channel state, thus maximizing the long-term reward (e.g., terms B, C, D, E in (3)), 
and exploitation, by accessing the channel with the highest estimated idle probability based on currently 
available information (the belief vector), which greedily maximizes the immediate reward (e.g., term A 
in (3)). For a short-sighted SU ($T = 1$ and $T = 2$), exploitation naturally dominates exploration (i.e., the 
immediate reward outweighs the potential gain in future reward) under certain system parameter settings, 
resulting in the optimality of the myopic sensing policy in a subset of this scenario. In contrast, to achieve 
maximal reward for $T \ge 3$, the SU should strike a balance between exploration and exploitation. In such a 
context, the myopic sensing policy that greedily maximizes the immediate reward is no longer optimal. 

IV. Conclusion 

In this paper, we study the optimality of the myopic sensing policy in the generic scenario of 
opportunistic spectrum access in a multi-channel communication system where an SU senses a subset of 
channels partially occupied by licensed PUs. We show that the myopic sensing policy is optimal only for 
a small subset of cases where the SU is allowed to sense two channels each slot. In the generic case, we 
give counterexamples to show that the myopic sensing policy, despite its simple structure, is not optimal. 
Due to the generic nature of the problem, we believe that the results obtained in this paper lead to a more 
in-depth understanding of the intrinsic structure and the resulting optimality of the myopic policy and will 
stimulate more profound research on this topic. 